In recent years, cross-spectral matching has gained attention in various biometric systems for identification and verification. Cross-spectral matching allows images captured in different bands of the electromagnetic spectrum to be matched against each other. In cross-spectral matching, one of the keys to successful matching is the choice of features used to represent an image; therefore, feature extraction is an essential step. Researchers have improved matching accuracy by developing robust features. This paper presents the features most commonly used in cross-spectral matching. The survey covers the basic concepts of cross-spectral matching, visual and thermal feature extraction, and state-of-the-art descriptors. Finally, this paper describes better feature selection methods for cross-spectral matching.
1. INTRODUCTION

Cross-spectral matching, also known as cross-spectrum matching, is the process of matching two images captured in different bands of the electromagnetic spectrum [1]. Typically, thermal-spectrum images are matched against visible light (VL) images. The thermal spectrum consists of four sub-bands, i.e., Near Infrared (NIR), Short Wave Infrared (SWIR), Medium Wave Infrared (MWIR), and Long Wave Infrared (LWIR) [2]. Research in the cross-spectral domain has grown rapidly along with applications of biometric systems [3]. Cross-spectral matching is widely used for security and national ID programs, as well as for personal identification and authentication using iris and face recognition [4-7]. A cross-spectral image matching scheme makes identification and authentication more accurate and efficient because it exploits the additional information contained in the different spectral bands and wavelengths [8].

The performance of cross-spectral matching depends on features that can represent information from both VL and thermal images. VL and thermal images carry information about the same subject even though the visual appearance and structure of the two images differ. Therefore, the most challenging task in cross-spectral image matching is choosing a feature that is representative of both VL and thermal images [9]. Various features have been employed in cross-spectral image matching with high recognition performance.

Trokielewicz [10] developed a mobile-based cross-spectral iris verification approach using Discrete Cosine Transform (DCT) and Gabor wavelet features. DCT is used in the Monro Iris Recognition Library (MIRLIN) software, whereas Gabor wavelets are used in the Open Source for IRIS (OSIRIS) software. DCT and Gabor wavelet features are suitable for cross-spectral iris recognition, yielding a low Equal Error Rate (EER). Abdullah et al. [11] explored Binarized Statistical Image Features (BSIF) to extract statistical features from NIR and VL iris images. BSIF features represent the statistical properties of NIR and VL iris images appropriately, with high iris recognition performance. Another interesting work was conducted by Klare and Jain [12], who combined the Histogram of Oriented Gradients (HOG) with the Local Binary Pattern (LBP) to describe the structure of the face in NIR and VL images. Experimental results showed that HOG and LBP perform well in representing NIR and VL images.

This paper surveys state-of-the-art features for cross-spectral image matching. We describe several features based on how they represent VL and thermal images. The main contribution of this paper is a brief description of the features used in cross-spectral image matching, to help in selecting the most suitable feature for face and iris recognition. The paper is organized as follows. Section 2 describes the basic concepts of cross-spectral matching and feature extraction, such as the definition of cross-spectral image matching and the block diagram of the matching process in the cross-spectral domain. Section 3 reviews the properties and characteristics of VL and thermal image features, and describes how features are extracted from VL and NIR images for cross-spectral matching. Section 4 reviews the robust features used in cross-spectral matching, including their feature extraction processes. Conclusions are summarized in Section 5.


2. CROSS SPECTRAL MATCHING 

Cross-spectral matching is the ability to recognize objects presented in two different modalities. The process is illustrated in Figure 1. First, pre-processing is applied to prepare an image for further processing; this step comprises image cropping, photometric and geometric normalization, and restoration. The next step is feature extraction, which extracts distinctive features from the thermal and VL images using specific descriptors. The resulting features are used as inputs to the matching step, which applies matching or classification algorithms.
Figure 1. Cross spectral matching
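To make the three stages of Figure 1 concrete, the following minimal Python/NumPy sketch chains them together. It is an illustration under stated assumptions, not a method from the surveyed works: the function names `preprocess`, `extract_features`, and `match` are hypothetical, pre-processing is reduced to a centre crop plus zero-mean, unit-variance photometric normalization, the descriptor is a placeholder magnitude-weighted histogram of unsigned gradient orientations, and matching is a Euclidean distance compared with an arbitrary threshold.

```python
import numpy as np

def preprocess(img, size=64):
    """Hypothetical pre-processing: centre crop plus photometric normalization."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    top, left = (h - size) // 2, (w - size) // 2
    crop = img[top:top + size, left:left + size]
    # Zero mean, unit variance removes global gain/offset differences between images.
    return (crop - crop.mean()) / (crop.std() + 1e-8)

def extract_features(img, bins=16):
    """Placeholder descriptor: magnitude-weighted histogram of unsigned gradient orientations."""
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, np.pi), weights=mag)
    return hist / (np.linalg.norm(hist) + 1e-8)

def match(feat_a, feat_b, threshold=0.5):
    """Euclidean distance between feature vectors; a small distance means a match."""
    dist = float(np.linalg.norm(feat_a - feat_b))
    return dist, dist < threshold
```

With a vertical-stripe test image, the image and its contrast-inverted copy (`255 - img`, a crude stand-in for a change of spectrum) produce the same descriptor because the orientations are unsigned, whereas the transposed image (horizontal stripes) falls into a different orientation bin and is rejected. Any descriptor from Section 4 could replace `extract_features`.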


3. VISIBLE AND THERMAL IMAGES FEATURES 

Image features are distinctive numerical or alphanumerical attributes that represent information about the content of a digital image. A numerical representation means that physical information in an image, such as heat, color, pressure, range, or non-visible wavelengths, is replaced by numerical values that are easier to analyze in various applications. In VL and thermal images, a feature can be either a visual characteristic of the image or an interpretive response to its spatial, symbolic, semantic, or emotional characteristics [13].

Feature extraction is an initial stage of all image analysis applications. It is the process of obtaining the distinctive characteristics that distinguish one image from another, and it identifies the information relevant to the computational tasks of a particular application [14]. The feature extraction process for VL and thermal images is described in Figure 2.

In cross-spectral matching, performance and matching accuracy depend on proper feature extraction. VL and thermal images are retrieved based on the values of their feature vectors; therefore, feature selection and feature extraction must be considered carefully. Feature selection is defined as the process of selecting the best features for a particular application [15].

Table 1 shows feature selection based on the visual properties of an image. Low-level features can be extracted directly without considering spatial relationships, whereas higher-level features take spatial information into account. The feature selection process must consider the following [16]:
a. Similarity between matched images: if two images are similar, the distance between their feature vectors is small; in particular, similar VL and thermal images should yield feature vectors that are close together.

b. The computation is not complex.

c. The feature dimension is small, so that it does not degrade matching efficiency.

d. The size of the dataset does not affect matching performance.

e. The features are robust against illumination variations and geometric transformations.


Figure 2. Representation of image features (panels: visible images, thermal images)


Table 1. Visual properties of visible and thermal images

Low level | Middle level | High level
--- | --- | ---
Color | Regions | Localization
Shape | Spatial relationship | Image categorization
Texture | | Object identification

Figure 3 presents a classification of feature extraction methods for cross-spectral matching. Feature extraction methods are divided into:

a. Structural methods: identify structural features of an image. Features are calculated from topological and geometric properties [17].

b. Statistical methods: identify statistical features of an image based on the statistical distribution of pixels [17].

c. Spectrum or transform-based methods: features are calculated from the spectral composition, i.e., the low and high frequencies, of an image. They can be further divided into spatial, frequency, and combined transforms [14].

d. Block-based methods: features are computed from image blocks. This method combines the important features of an image into one block; consequently, the size and number of blocks affect the resulting features [14].

e. Combined methods: features are calculated by concatenating several individual methods to provide greater discriminative ability [14].



Figure 3. Classification of feature extraction methods for cross-spectral images (examples shown include the Fourier transform and the Radon transform)


4. STATE OF THE ART FEATURE DESCRIPTORS 

In this section, we present some popular descriptors used in cross-spectral matching. These descriptors are frequently used because they describe VL and thermal images robustly. The descriptors commonly used in cross-spectral matching are listed in Table 2.

4.1. Difference of Gaussian Filter 

The Difference of Gaussians (DoG) is a band-pass filter that uses Gaussian filters to produce a normalized image [2]. DoG can suppress noise and aliasing in cross-spectral images caused by the frequency difference between VL and thermal images. DoG also has low computational complexity and is widely used in the literature [18], [19].

DoG can accommodate difficult lighting conditions through the choice of the inner and outer Gaussian widths. For datasets with strong lighting variations, recognition gives the best results with an outer Gaussian of about 2 pixels, and up to about 4 pixels for datasets with less extreme lighting variations, while the inner Gaussian is quite narrow, at about 1 pixel [20].

A band-pass filter is obtained by subtracting two Gaussians with different standard deviations σ: the difference suppresses frequencies outside the band between the cut-off frequencies of the two Gaussians, and the image features located within this frequency band are extracted.

The DoG feature extraction consists of three stages [20]:

a. The input image is convolved with two Gaussian kernels having different standard deviations, as shown in (1), to produce two blurred images.

b. The two blurred versions are subtracted from each other to obtain a normalized image.


Features for Cross Spectral Image Matching: A Survey (Maulisa Oktiana) 









ISSN: 2302-9285 


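c. To construct a band-pass filter, σ0 must be smaller than σ1; the filter is given by (1), and the Gaussian kernel by (2).

$$ D(x,y \mid \sigma_0,\sigma_1) = \left[ G(x,y \mid \sigma_0) - G(x,y \mid \sigma_1) \right] * I(x,y) \tag{1} $$

where I(x, y) is the original image, * denotes convolution, and G(x, y | σ) is the Gaussian kernel with standard deviation σ, defined as:

$$ G(x,y \mid \sigma) = \frac{1}{2\pi\sigma^{2}} \exp\left( -\frac{x^{2}+y^{2}}{2\sigma^{2}} \right) \tag{2} $$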
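Equations (1) and (2) can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions: the helper names `gaussian_kernel_1d`, `gaussian_blur`, and `dog` are hypothetical; the continuous kernel in (2) is replaced by a sampled kernel truncated at 3σ and renormalized to sum to one; image borders use reflect padding; and the defaults σ0 = 1 and σ1 = 2 pixels simply follow the guideline from [20] quoted above.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Sampled 1-D Gaussian, truncated at 3 sigma and renormalized to sum to one."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    """G(x, y | sigma) * I(x, y) via two 1-D passes (the 2-D Gaussian is separable)."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(np.asarray(img, dtype=float), r, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")  # filter along x
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")    # filter along y

def dog(img, sigma0=1.0, sigma1=2.0):
    """Eq. (1): D = G(sigma0) * I - G(sigma1) * I, a band-pass result when sigma0 < sigma1."""
    if not sigma0 < sigma1:
        raise ValueError("a band-pass DoG needs sigma0 < sigma1")
    return gaussian_blur(img, sigma0) - gaussian_blur(img, sigma1)
```

Because both kernels sum to one, a constant (zero-frequency) image is mapped to zero, which is the band-pass behaviour described above; convolution is linear, so blurring twice and subtracting is equivalent to convolving once with the difference kernel in (1).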


Table 2. Various descriptors in cross spectral matching

Publication | Domain modality | Application | Feature extraction technique
--- | --- | --- | ---
[2] | NIR vs. VL | Iris recognition | Real-valued Log-Gabor phase
[4] | NIR vs. VL | Face recognition | LBP, DLBP
[10] | NIR vs. VL | Iris recognition | DCT, Gabor wavelet
[11] | NIR vs. VL | Iris recognition | Binarized Statistical Image Features (BSIF)
[12] | NIR vs. VL | Face recognition | Uniform LBP and HOG
[16] | NIR vs. VL | Face recognition | Log-DoG LBP and Log-DoG HOG
[19] | NIR vs. VL | Face recognition | DoG, Center Surround Divisive Normalization (CSDN), Scale Invariant Feature Transform (SIFT), and Multiscale Local Binary Pattern (MLBP)
[21] | NIR vs. VL, SWIR vs. VL | Face recognition | LBP and Generalized LBP (GLBP)
[22] | SWIR vs. VL | Face recognition | LBP and LTP
[23] | NIR vs. VL | Face recognition | Directional binary code
[24] | Thermal vs. VL | Face recognition | Pyramid Histogram of Oriented Gradients (PHOG), PSFIT
[25] | NIR vs. VL | Face recognition | Multi-resolution LBP
[26] | MWIR vs. VL, LWIR vs. VL | Face recognition | Canonical Correlation Analysis (CCA)
[27] | NIR vs. VL | Face recognition | Wavelet transform
[28] | NIR vs. VL | Face recognition | Sparse dictionary
[29] | SWIR vs. VL | Face recognition | Gabor filter, LBP, Simplified Weber Local Descriptor (SWLD), and GLBP
[30] | Thermal vs. VL | Face recognition | Deep Perceptual Mapping (DPM)
[31] | SWIR vs. VL | Face recognition | Contrast Limited Adaptive Histogram Equalization (CLAHE)
[32] | NIR vs. VL | Face recognition | Gaussian filter and SIFT


4.2. Local Binary Pattern 

Local Binary Pattern (LBP) was first introduced by Ojala et al. in 1996 [33]. The LBP operator is a gray-scale-invariant texture descriptor that analyzes the texture of an image based on its texture spectrum, which is built from texture units (TU).

The texture spectrum is the distribution of texture units occurring in a region. The original LBP uses a 3x3 neighborhood and generates an 8-bit code from the 8 pixels surrounding the center pixel, so a texture unit has 2^8 = 256 possible histogram bins for describing the spatial pattern in a 3x3 neighborhood. The code is computed by multiplying the weights of the corresponding pixels by the values of the thresholded neighborhood and summing the products, which gives the texture unit number (for example, 169) [34]. The LBP operator is defined as:

$$ LBP(x_c, y_c) = \sum_{n=0}^{7} 2^{n} \, s(i_n - i_c) \tag{3} $$

where

$$ s(u) = \begin{cases} 1, & u \ge 0 \\ 0, & \text{otherwise} \end{cases} $$

and i_c and i_n denote the gray values of the central pixel (x_c, y_c) and of its 8 neighbors, respectively.

The pixels of the 3x3 neighborhood are thresholded by the center pixel: if a neighbor's value is greater than or equal to the center value, it is assigned 1, and 0 otherwise. The LBP operator was later extended to circular neighborhoods with (P, R) = (8, 1), (16, 2), and (8, 2), where P is the number of sampling points and R the radius, to accommodate texture at different scales.
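A direct NumPy transcription of (3) for the basic 3x3 operator is sketched below. The helper names `lbp_3x3` and `lbp_histogram` are hypothetical, border pixels are simply skipped, and the neighbor order (clockwise from the top-left pixel, neighbor n weighted by 2^n) is an assumption: any fixed order works as long as it is used consistently.

```python
import numpy as np

# Assumed neighbour order: clockwise from the top-left pixel; neighbour n has weight 2**n.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_3x3(img):
    """Basic LBP code of eq. (3) for every interior pixel of a gray-scale image."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for n, (dy, dx) in enumerate(OFFSETS):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # s(i_n - i_c) = 1 when the neighbour is >= the centre, times weight 2**n
        codes += (neighbour >= center).astype(np.int64) << n
    return codes

def lbp_histogram(img):
    """Normalized 256-bin histogram of LBP codes (the texture spectrum)."""
    hist = np.bincount(lbp_3x3(img).ravel(), minlength=256)
    return hist / hist.sum()
```

For the patch `[[6, 1, 7], [2, 5, 9], [5, 3, 4]]`, the neighbors 6, 7, 9, and 5 are at least the center value 5, giving bits 0, 2, 3, and 6 under this ordering, i.e. the code 1 + 4 + 8 + 64 = 77. Because only the sign of i_n - i_c matters, any monotonically increasing gray-level change (for example 2I + 10) leaves the codes unchanged, which is the gray-scale invariance mentioned above.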


4.3. Binarized Statistical Image Features

Binarized Statistical Image Features (BSIF) is a texture descriptor inspired by the Local Binary Pattern (LBP). BSIF uses a binary code to represent each pixel neighborhood in an image [35]. The binarized feature is generated by convolving the image with a set of linear filters, and the response of each filter is then binarized with a threshold at zero. The linear filters are constructed with independent component analysis (ICA), which maximizes the statistical independence of the filter responses over a training set of natural image patches. The BSIF extraction process, as described in [36], is:

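a. First, the filter responses are computed. Let X be an image patch of size l x l pixels, W_i a linear filter of the same size, and s_i the response of the filter:

$$ s_i = \sum_{u,v} W_i(u,v)\, X(u,v) = \mathbf{w}_i^{\top} \mathbf{x} \tag{4} $$

where w_i and x are the vectors containing the pixels of W_i and X.

b. The binarized feature b_i is obtained by:

$$ b_i = \begin{cases} 1, & s_i > 0 \\ 0, & \text{otherwise} \end{cases} \tag{5} $$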
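Equations (4) and (5) can be sketched with NumPy as below. This is a simplified illustration under explicit assumptions: the ICA learning of the filters is out of scope, so the caller passes any stack of n filters of size l x l (the `filters` array is a stand-in, not the learned BSIF filters); responses are computed exactly as in (4), as a sum of products over each l x l patch; and packing the n bits of a patch into one integer (bit i weighted by 2^i, as in LBP) is an assumed convention. The helper names `bsif_codes` and `bsif_histogram` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def bsif_codes(img, filters):
    """BSIF-style codes: s_i = sum_{u,v} W_i(u,v) X(u,v) (eq. 4), b_i = [s_i > 0] (eq. 5).

    filters: array of shape (n, l, l). Real BSIF learns them with ICA on natural
    image patches; here they are supplied by the caller (assumption).
    """
    img = np.asarray(img, dtype=float)
    n, l, _ = filters.shape
    patches = sliding_window_view(img, (l, l))          # (H-l+1, W-l+1, l, l)
    s = np.einsum("hwuv,nuv->hwn", patches, filters)    # filter responses, eq. (4)
    bits = (s > 0).astype(np.int64)                     # binarization, eq. (5)
    weights = 1 << np.arange(n)                         # assumed packing: bit i -> 2**i
    return bits @ weights                               # one integer code per patch

def bsif_histogram(img, filters):
    """Normalized histogram of the 2**n possible BSIF codes."""
    codes = bsif_codes(img, filters)
    hist = np.bincount(codes.ravel(), minlength=2 ** filters.shape[0])
    return hist / hist.sum()
```

With two hand-made 3x3 derivative filters (a horizontal one and its transpose) instead of ICA filters, an image whose intensity increases along the columns switches on only the first bit, while its transpose switches on only the second, so each pixel neighborhood is summarized by a short binary code.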


4.4. Scale Invariant Feature Transform 

The Scale Invariant Feature Transform (SIFT) is a robust descriptor that combines a DoG interest-region detector with a corresponding feature descriptor. SIFT encodes image information in a localized set of gradient orientation histograms; thus, SIFT can accommodate illumination variations and small positional shifts [37].
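The "localized set of gradient orientation histograms" can be illustrated with the simplified, SIFT-style descriptor below for a single 16x16 patch. It is deliberately partial and is not full SIFT: keypoint detection with the DoG detector, orientation assignment, Gaussian weighting, and trilinear interpolation are all omitted. What it keeps (as an assumption about the essential part) is a 4x4 grid of 8-bin orientation histograms (128 values) followed by normalize, clip at 0.2, and renormalize. The function name `sift_like_descriptor` is hypothetical.

```python
import numpy as np

def sift_like_descriptor(patch, n_cells=4, n_bins=8, clip=0.2):
    """Simplified SIFT-style descriptor of one square patch (e.g. 16x16 -> 128 values)."""
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)          # orientation in [0, 2*pi)
    bins = np.minimum((ang / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    cell = patch.shape[0] // n_cells                      # 4 pixels for a 16x16 patch
    desc = np.zeros((n_cells, n_cells, n_bins))
    for i in range(n_cells):
        for j in range(n_cells):
            rs = slice(i * cell, (i + 1) * cell)
            cs = slice(j * cell, (j + 1) * cell)
            # magnitude-weighted orientation histogram of one cell
            desc[i, j] = np.bincount(bins[rs, cs].ravel(),
                                     weights=mag[rs, cs].ravel(), minlength=n_bins)
    desc = desc.ravel()
    desc /= np.linalg.norm(desc) + 1e-12   # unit length cancels global contrast changes
    desc = np.minimum(desc, clip)          # damp the influence of large gradients
    return desc / (np.linalg.norm(desc) + 1e-12)
```

Gradients ignore a constant intensity offset and the unit-length normalization cancels a positive gain, so an affine brightness change such as 2I + 10 gives the same descriptor; this is the sense in which SIFT accommodates illumination variations.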


4.5. Histogram of Oriented Gradients

The Histogram of Oriented Gradients (HOG) is also known as histogram normalization because of its ability to normalize local responses [38]. HOG was adopted for human detection and is used in object detection applications. HOG is robust to shadowing, invariant to illumination, and reliable under photometric variations. However, HOG does not explicitly encode the spatial distribution of an image: it accumulates the gradient energy of individual pixels regardless of where they lie within each cell. The feature extraction process is as follows [39]:

a. The image is divided into small regions called cells (8x8 pixels).

b. The gradient magnitude and orientation of each pixel are calculated.

c. The results of step b are interpolated into orientation histogram bins of 20 degrees.

d. Cells are grouped into overlapping blocks.

e. The overlapping blocks are normalized.

f. Finally, the normalized histograms are concatenated into the feature vector.
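Steps a to f can be sketched in NumPy as follows. The step list fixes only the 8x8 cells and the 20-degree bins; everything else here is an assumption borrowed from common HOG practice: unsigned orientations (0 to 180 degrees, hence 9 bins), votes split linearly between the two nearest bins, 2x2-cell blocks with a one-cell stride, and L2 block normalization. The function name `hog` is hypothetical and this is not a reference implementation.

```python
import numpy as np

def hog(img, cell=8, n_bins=9, block=2, eps=1e-6):
    """Minimal HOG sketch of steps a-f (unsigned orientations, 20-degree bins)."""
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)                                # step b: magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0          # step b: unsigned orientation
    width = 180.0 / n_bins                                # 20 degrees for 9 bins
    pos = ang / width - 0.5                               # position relative to bin centres
    lo = np.floor(pos).astype(int)
    frac = pos - lo
    lo_bin, hi_bin = lo % n_bins, (lo + 1) % n_bins       # orientation wraps around
    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, n_bins))
    for cy in range(n_cy):                                # step a: 8x8-pixel cells
        for cx in range(n_cx):
            sl = (slice(cy * cell, (cy + 1) * cell), slice(cx * cell, (cx + 1) * cell))
            m, f = mag[sl].ravel(), frac[sl].ravel()
            # step c: split each vote between the two nearest bins
            np.add.at(hist[cy, cx], lo_bin[sl].ravel(), m * (1 - f))
            np.add.at(hist[cy, cx], hi_bin[sl].ravel(), m * f)
    feats = []
    for by in range(n_cy - block + 1):                    # step d: overlapping blocks
        for bx in range(n_cx - block + 1):
            v = hist[by:by + block, bx:bx + block].ravel()
            feats.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))   # step e: L2 norm
    return np.concatenate(feats)                          # step f: feature vector
```

For a 128x64 image this gives 16x8 cells, 15x7 overlapping blocks of 2x2 cells with 36 values each, i.e. a 3780-dimensional feature vector.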


4.6. Additive Fusion Block Based 

Additive block-based fusion, proposed by Varadarajan et al. [40], uses a block-based extraction process. The block-based approach keeps the number of extracted features at an optimum. In this technique, the size and number of blocks affect the resulting features: the more blocks there are, the fewer features are extracted, because smaller blocks produce a smaller resultant block. The ideal block sizes are 4, 8, or 16 pixels. The feature extraction process consists of the following steps:

a. The image is divided into individual blocks of size 4, 8, or 16 pixels.

b. The Chirp Z-Transform (CZT) and the Goertzel algorithm are applied to each block for preprocessing and image enhancement.

c. The individual blocks are summed to yield a single resultant block that contains the essential features of each block.

d. This resultant block then becomes the feature vector.
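The additive fusion in steps a, c, and d can be sketched in a few lines of NumPy. Step b is not reproduced: the CZT/Goertzel enhancement of [40] is replaced by an optional, hypothetical `enhance` callback that defaults to the identity. Cropping the image to a whole number of blocks is also an assumption. The function name `additive_block_fusion` is hypothetical.

```python
import numpy as np

def additive_block_fusion(img, block=8, enhance=None):
    """Sketch of additive block-based fusion (steps a, c, d; step b is a stand-in hook).

    enhance: optional per-block function standing in for the CZT/Goertzel
    enhancement of step b (not implemented here); identity if None.
    """
    img = np.asarray(img, dtype=float)
    nby, nbx = img.shape[0] // block, img.shape[1] // block
    # step a: crop to a whole number of blocks and split into (nby, nbx, block, block)
    blocks = img[:nby * block, :nbx * block].reshape(nby, block, nbx, block)
    blocks = blocks.transpose(0, 2, 1, 3)
    if enhance is not None:                                # step b (hypothetical hook)
        blocks = np.array([[enhance(b) for b in row] for row in blocks])
    resultant = blocks.sum(axis=(0, 1))                    # step c: sum all blocks
    return resultant.ravel()                               # step d: feature vector
```

The feature length is block x block regardless of the image size, which illustrates the statement above that smaller blocks (and therefore more of them) yield fewer features. For an 8x8 image with `block=4`, the first feature is the sum of the top-left pixel of each of the four blocks.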


4.7. Discrete Cosine Transform 

The Discrete Cosine Transform (DCT) transforms spatial image information into the frequency domain, using only the cosine components of the Fourier transform [41]. The DCT concentrates the energy of an image into a few coefficients (energy compaction): the signal energy is concentrated in the large-magnitude, low-frequency coefficients located in the upper-left corner of the DCT array.

Low-frequency coefficients contain most of the essential image information; therefore, the original image can be approximately reconstructed using only the low-frequency coefficients. Less important information is located in the lower-right part of the DCT array (the high frequencies). These high-frequency coefficients can be discarded through quantization without significantly affecting image quality [42-43]. The DCT feature extraction process is as follows [44]:

a. The original image is divided into 8x8 blocks.

b. The DCT coefficients of each block are calculated using (6), giving the DCT coefficient arrays.

c. The DCT coefficients are quantized.

d. The low-frequency values in the upper-left corner of the DCT array are used as the feature vector.


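$$ F(u,v) = \alpha(u)\,\alpha(v) \sum_{x=0}^{N-1} \sum_{y=0}^{M-1} f(x,y) \cos\left[\frac{\pi(2x+1)u}{2N}\right] \cos\left[\frac{\pi(2y+1)v}{2M}\right] \tag{6} $$

$$ \alpha(u) = \begin{cases} \sqrt{1/N}, & u = 0 \\ \sqrt{2/N}, & u \neq 0 \end{cases} \qquad \alpha(v) = \begin{cases} \sqrt{1/M}, & v = 0 \\ \sqrt{2/M}, & v \neq 0 \end{cases} $$

where u = 0, 1, ..., N-1; v = 0, 1, ..., M-1; and f(x, y) represents the intensity of the pixel in row x and column y [45].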
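Equation (6) and steps a to d can be sketched in NumPy using the matrix form of the orthonormal 2-D DCT, F = C_N f C_M^T, where C_N[u, x] = α(u) cos[π(2x+1)u / (2N)]. The quantization step `q = 10` and the `keep = 4` retained low-frequency rows and columns are hypothetical choices (JPEG-style quantization tables would be a common alternative), and the helper names `dct_matrix`, `dct2`, and `dct_features` are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT basis: C[u, x] = alpha(u) * cos(pi * (2x + 1) * u / (2n))."""
    u = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    alpha = np.where(u == 0, np.sqrt(1.0 / n), np.sqrt(2.0 / n))
    return alpha * np.cos(np.pi * (2 * x + 1) * u / (2 * n))

def dct2(block):
    """2-D DCT of eq. (6) for an N x M block, computed as C_N f C_M^T."""
    n, m = block.shape
    return dct_matrix(n) @ block @ dct_matrix(m).T

def dct_features(img, block=8, keep=4, q=10.0):
    """Steps a-d: 8x8 blocks, DCT, uniform quantization (assumed), upper-left coefficients."""
    img = np.asarray(img, dtype=float)
    h, w = (img.shape[0] // block) * block, (img.shape[1] // block) * block
    feats = []
    for y in range(0, h, block):                   # step a: 8x8 blocks
        for x in range(0, w, block):
            coeffs = dct2(img[y:y + block, x:x + block])    # step b: eq. (6)
            quant = np.round(coeffs / q)                    # step c: uniform step q
            feats.append(quant[:keep, :keep].ravel())       # step d: low frequencies
    return np.concatenate(feats)
```

Energy compaction is easy to see on a flat 8x8 block of value 10: all of its energy goes into the DC coefficient F(0, 0) = (1/8) x 64 x 10 = 80, and every other coefficient is zero up to rounding error.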


4.8. Radon Transform 

The Radon transform can enhance low-frequency components by computing line integrals of an image. It turns rotation into translation, since rotating the image shifts its Radon transform along the angle axis, and it is often used in face recognition. The Radon feature extraction process is as follows [46]:

a. The Radon space is calculated using (7).

b. Discriminative information is located in the Radon space, which is computed from projections at orientations from 0° to 179°.

c. The DCT of the Radon space is calculated to obtain frequency features.

d. 25% of the DCT coefficients are concatenated to form the feature vector.

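$$ R(r,\theta)\left[f(x,y)\right] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\, \delta(r - x\cos\theta - y\sin\theta)\, dx\, dy \tag{7} $$

where δ(·) is the Dirac delta function, r ∈ (-∞, ∞) is the perpendicular distance of a line from the origin, and θ is the angle between the x-axis and the distance vector.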
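Steps a to d can be sketched in NumPy with a simple discrete approximation of (7): each pixel adds its value f(x, y) to the projection bin r = round(x cos θ + y sin θ), with (x, y) measured from the image centre and r sampled every pixel. This pixel-driven binning is an assumption standing in for the continuous integral, not the implementation used in [46]; likewise, keeping the upper-left quarter of the DCT array (roughly 25% of the coefficients) is an assumed reading of step d. The helper names `radon_space`, `dct_matrix`, and `radon_dct_features` are hypothetical.

```python
import numpy as np

def radon_space(img, angles_deg=range(180)):
    """Pixel-driven approximation of (7): each pixel adds f(x, y) to the bin
    r = round(x cos(theta) + y sin(theta)), with (x, y) measured from the image centre."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    xs -= (w - 1) / 2.0
    ys -= (h - 1) / 2.0
    r_max = int(np.ceil(np.hypot(h, w) / 2.0))    # largest possible |r|
    n_r = 2 * r_max + 1
    sino = np.zeros((n_r, len(angles_deg)))
    for j, a in enumerate(angles_deg):             # step b: projections at 0-179 degrees
        t = np.deg2rad(a)
        r = np.rint(xs * np.cos(t) + ys * np.sin(t)).astype(int) + r_max
        sino[:, j] = np.bincount(r.ravel(), weights=img.ravel(), minlength=n_r)
    return sino

def dct_matrix(n):
    """Orthonormal DCT basis, as in eq. (6)."""
    u = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    alpha = np.where(u == 0, np.sqrt(1.0 / n), np.sqrt(2.0 / n))
    return alpha * np.cos(np.pi * (2 * x + 1) * u / (2 * n))

def radon_dct_features(img):
    """Steps a-d: Radon space, its 2-D DCT, then the upper-left quarter as features."""
    sino = radon_space(img)                                  # step a
    n, m = sino.shape
    coeffs = dct_matrix(n) @ sino @ dct_matrix(m).T          # step c
    return coeffs[: n // 2, : m // 2].ravel()                # step d (assumed 25%)
```

A simple sanity check for this approximation is that every column of the Radon space sums to the total image intensity, since each projection redistributes all pixel values along r without losing any.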


5. CONCLUSIONS 

The literature on cross-spectral matching is growing rapidly, and many researchers have proposed cross-spectral matching frameworks with improved matching performance. This survey presents an overview of the features of thermal and visible images, together with brief descriptions of current state-of-the-art feature extraction methods for cross-spectral matching. Image features affect the performance of cross-spectral matching; therefore, selecting feature extraction methods suitable for a given application is an essential issue. Researchers must also consider the visual properties of VL and thermal images.